Infrastructure & Services


EcoLight: Intersection Control in Developing Regions Under Extreme Budget and Network Constraints

Neural Information Processing Systems

Effective intersection control can play an important role in reducing traffic congestion and associated vehicular emissions. This is vitally needed in developing countries, where air pollution is reaching life-threatening levels. This paper presents EcoLight, an intersection control system for developing regions, where budgets are constrained and network connectivity is very poor. EcoLight learns effective control policies offline using state-of-the-art Deep Reinforcement Learning methods, but deploys highly efficient runtime control algorithms on low-cost embedded devices that work standalone on the road without server connectivity. EcoLight optimizes both average-case and worst-case values of throughput, travel time, and other metrics, as evaluated on open-source datasets from New York and on a custom developing-region dataset.
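The offline-training / lightweight-deployment split described above can be illustrated with a minimal sketch. This is not EcoLight's actual algorithm: the bucket sizes, the greedy stand-in policy, and all function names are hypothetical, chosen only to show how a learned policy can be precomputed into a lookup table that a low-cost embedded controller evaluates in constant time without server connectivity.

```python
from itertools import product

def discretize(queue_lengths, bucket_size=5, buckets=4):
    """Map raw per-approach queue lengths to coarse buckets (assumed scheme)."""
    return tuple(min(q // bucket_size, buckets - 1) for q in queue_lengths)

def build_lookup_table(policy, num_approaches=4, buckets=4):
    """Offline step: precompute the policy's action for every coarse state."""
    return {state: policy(state)
            for state in product(range(buckets), repeat=num_approaches)}

# Stand-in for a trained DRL policy: give green to the busiest approach.
greedy_policy = lambda state: max(range(len(state)), key=lambda i: state[i])

table = build_lookup_table(greedy_policy)

def runtime_control(queue_lengths, table):
    """Online step: an O(1) dictionary lookup, cheap enough for embedded use."""
    return table[discretize(queue_lengths)]
```

With 4 approaches and 4 buckets each, the table holds only 256 entries, so the runtime side needs no neural-network inference at all.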


OpenSatMap: A Fine-grained High-resolution Satellite Dataset for Large-scale Map Construction
Yuntao Chen

Neural Information Processing Systems

In this paper, we propose OpenSatMap, a fine-grained, high-resolution satellite dataset for large-scale map construction. Map construction underpins the transportation industry, supporting applications such as navigation and autonomous driving. Extracting road structures from satellite images is an efficient way to construct large-scale maps. However, existing satellite datasets provide only coarse semantic-level labels at relatively low resolution (up to level 19), impeding the advancement of this field. In contrast, the proposed OpenSatMap (1) has fine-grained instance-level annotations; (2) consists of high-resolution images (level 20); (3) is currently the largest dataset of its kind; (4) contains highly diverse data. Moreover, OpenSatMap covers and aligns with the popular nuScenes and Argoverse 2 datasets to potentially advance autonomous driving technologies. By publishing and maintaining the dataset, we provide a high-quality benchmark for satellite-based map construction and downstream tasks like autonomous driving.


ERBench: An Entity-Relationship based Automatically Verifiable Hallucination Benchmark for Large Language Models

Neural Information Processing Systems

Large language models (LLMs) have achieved unprecedented performance in various applications, yet evaluating them is still challenging. Existing benchmarks are either manually constructed or are automatic, but lack the ability to evaluate the thought process of LLMs with arbitrary complexity. We contend that utilizing existing relational databases based on the entity-relationship (ER) model is a promising approach for constructing benchmarks, as they contain structured knowledge that can be used to question LLMs. Unlike knowledge graphs, which are also used to evaluate LLMs, relational databases have integrity constraints that can be used to better construct complex in-depth questions and verify answers: (1) functional dependencies can be used to pinpoint critical keywords that an LLM must know to properly answer a given question containing certain attribute values; and (2) foreign key constraints can be used to join relations and construct multi-hop questions, which can be arbitrarily long and used to debug intermediate answers. We thus propose ERBench, which uses these integrity constraints to convert any database into an LLM benchmark. ERBench supports continuous evaluation as databases change, multimodal questions, and various prompt engineering techniques. In our experiments, we construct LLM benchmarks using databases of multiple domains and make an extensive comparison of contemporary LLMs. We show how ERBench can properly evaluate any LLM by not only checking for answer correctness, but also effectively verifying the rationales by looking for the right keywords.
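The foreign-key mechanism the abstract describes can be sketched concretely. The schema, data, and question template below are illustrative examples of the general idea (not ERBench's own code or schema): a foreign-key join turns two relations into one multi-hop question whose gold answers, including the intermediate hop, come directly from the database.

```python
import sqlite3

# Toy two-relation database with a foreign key: movie.director -> director.name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE director(name TEXT PRIMARY KEY, birth_year INTEGER);
CREATE TABLE movie(title TEXT PRIMARY KEY, year INTEGER,
                   director TEXT REFERENCES director(name));
INSERT INTO director VALUES ('Bong Joon-ho', 1969);
INSERT INTO movie VALUES ('Parasite', 2019, 'Bong Joon-ho');
""")

def multi_hop_question(title):
    """One FK hop (movie -> director) yields a two-step question whose
    intermediate answer (the director) is itself checkable."""
    row = conn.execute(
        "SELECT m.director, d.birth_year FROM movie m "
        "JOIN director d ON m.director = d.name WHERE m.title = ?",
        (title,)).fetchone()
    question = (f"Who directed {title}, and in what year was "
                f"that director born?")
    return question, {"director": row[0], "birth_year": row[1]}

q, gold = multi_hop_question("Parasite")
```

Chaining further foreign keys (e.g. director to studio to country) extends the same construction to arbitrarily long hop sequences, each with a verifiable intermediate answer.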


Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization

Neural Information Processing Systems

Researchers have been studying approaches to steer the behavior of Large Language Models (LLMs) and build personalized LLMs tailored for various applications. While fine-tuning seems to be a direct solution, it requires substantial computational resources and may significantly affect the utility of the original LLM. Recent endeavors have introduced more lightweight strategies, focusing on extracting "steering vectors" to guide the model's output toward desired behaviors by adjusting activations within specific layers of the LLM's transformer architecture. However, such steering vectors are directly extracted from the activations of human preference data and thus often lead to suboptimal results and occasional failures, especially in alignment-related scenarios. In this work, we propose an innovative approach that could produce more effective steering vectors through bi-directional preference optimization. Our method is designed to allow steering vectors to directly influence the generation probability of contrastive human preference data pairs, thereby offering a more precise representation of the target behavior.
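The core activation-steering operation the abstract builds on can be shown in a few lines. This is a generic sketch, not the paper's method or API: the function name, the toy hidden states, and the direction vector are all illustrative. It shows only the shared primitive, adding a fixed steering vector, scaled by a strength coefficient, to a layer's hidden activations at inference time.

```python
def steer(activations, steering_vector, alpha=1.0):
    """Shift each token's hidden state along a fixed steering direction.
    activations: list of per-token hidden-state vectors (plain lists here)."""
    return [[h + alpha * s for h, s in zip(token, steering_vector)]
            for token in activations]

# Toy example: 2 tokens, hidden size 3, and an illustrative direction that
# might encode a preferred behavior.
hidden = [[0.0, 1.0, 2.0], [1.0, 1.0, 1.0]]
direction = [0.5, -0.5, 0.0]
steered = steer(hidden, direction, alpha=2.0)
```

Flipping the sign of `alpha` steers away from the direction instead of toward it, which is the lever that a bi-directional preference objective, as described above, can optimize over contrastive preference pairs.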


Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Chae Won Kim, KAIST

Neural Information Processing Systems

The rapid development of large language and vision models (LLVMs) has been driven by advances in visual instruction tuning. Recently, open-source LLVMs have curated high-quality visual instruction tuning datasets and utilized additional vision encoders or multiple computer vision models in order to narrow the performance gap with powerful closed-source LLVMs. These advancements are attributed to the multifaceted information required for diverse capabilities, including fundamental image understanding, real-world knowledge of common-sense and non-object concepts (e.g., charts, diagrams, symbols, signs, and math problems), and step-by-step procedures for solving complex questions. Drawing on this multifaceted information, we present a new efficient LLVM, Mamba-based traversal of rationales (Meteor), which leverages multifaceted rationales to enhance understanding and answering capabilities. To embed lengthy rationales containing abundant information, we employ the Mamba architecture, capable of processing sequential data with linear time complexity. We introduce a new concept, traversal of rationale, that facilitates efficient embedding of rationales. Subsequently, the backbone multimodal language model (MLM) is trained to generate answers with the aid of rationales. Through these steps, Meteor achieves significant improvements in vision-language performance across multiple evaluation benchmarks requiring diverse capabilities, without scaling up the model size or employing additional vision encoders and computer vision models.


3dbb8b6b5576b85afb3037e9630812dc-Paper-Conference.pdf

Neural Information Processing Systems

The reliability of driving perception systems under unprecedented conditions is crucial for practical usage. Recent advances have prompted increasing interest in multi-LiDAR perception. However, prevailing driving datasets predominantly utilize single-LiDAR systems and collect data devoid of adverse conditions, failing to capture the complexities of real-world environments accurately. Addressing these gaps, we propose Place3D, a full-cycle pipeline that encompasses LiDAR placement optimization, data generation, and downstream evaluations. Our framework makes three appealing contributions. 1) To identify the most effective configurations for multi-LiDAR systems, we introduce the Surrogate Metric of the Semantic Occupancy Grids (M-SOG) to evaluate LiDAR placement quality.
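The idea of scoring a sensor placement against a semantic occupancy grid can be sketched as follows. This is a hedged simplification, not the paper's M-SOG definition: it reduces "placement quality" to the fraction of occupied grid cells within range of at least one LiDAR, with all positions, ranges, and cell coordinates invented for illustration.

```python
import math

def coverage_score(lidar_positions, occupied_cells, max_range):
    """Fraction of occupied semantic-grid cells (x, y centers) reachable by
    at least one LiDAR within its max range. A crude surrogate for
    placement quality, under this sketch's assumptions."""
    covered = sum(
        1 for cx, cy in occupied_cells
        if any(math.hypot(cx - lx, cy - ly) <= max_range
               for lx, ly in lidar_positions))
    return covered / len(occupied_cells)

# One roof-mounted LiDAR at the origin; four occupied cells along one axis.
cells = [(0, 0), (5, 0), (20, 0), (40, 0)]
score = coverage_score([(0, 0)], cells, max_range=25)
```

Comparing such scores across candidate placements gives a cheap proxy for ranking configurations before running full downstream evaluations.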


Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach

Neural Information Processing Systems

Web-scale visual entity recognition, the task of associating images with their corresponding entities within vast knowledge bases like Wikipedia, presents significant challenges due to the lack of clean, large-scale training data. In this paper, we propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation. Instead of relying on the multimodal LLM to directly annotate data, which we found to be suboptimal, we prompt it to reason about potential candidate entity labels by accessing additional contextually relevant information (such as Wikipedia), resulting in more accurate annotations. We further use the multimodal LLM to enrich the dataset by generating question-answer pairs and a grounded fine-grained textual description (referred to as "rationale") that explains the connection between images and their assigned entities. Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks (e.g.
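The verify-with-context pattern described above can be sketched as a prompt-construction step. This is an illustrative template only, not the paper's actual prompt or pipeline; the function name, field wording, and example candidates are all hypothetical. The point it demonstrates is that the model is handed candidate labels plus retrieved context and asked to choose and justify, rather than to annotate the image directly.

```python
def build_verification_prompt(caption, candidates, contexts):
    """Assemble a text prompt pairing each candidate entity with retrieved
    context (e.g. a Wikipedia snippet), asking the model to verify the label
    and produce a rationale."""
    lines = [f"Image description: {caption}", "Candidate entities:"]
    for name in candidates:
        lines.append(f"- {name}: {contexts.get(name, 'no context found')}")
    lines.append("Pick the candidate best supported by the context, and "
                 "explain the connection between image and entity (rationale).")
    return "\n".join(lines)

prompt = build_verification_prompt(
    caption="A narrow suspension bridge over a forested gorge",
    candidates=["Capilano Suspension Bridge", "Golden Gate Bridge"],
    contexts={"Capilano Suspension Bridge":
              "pedestrian suspension bridge over the Capilano River, Canada"},
)
```

The model's free-text justification can then be stored alongside the chosen label as the "rationale" annotation.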


Delayed Propagation Transformer: A Universal Computation Engine towards Practical Control in Cyber-Physical Systems

Neural Information Processing Systems

Multi-agent control is a central theme in Cyber-Physical Systems (CPS). However, current control methods either receive non-Markovian states due to insufficient sensing and decentralized design, or suffer from poor convergence. This paper presents the Delayed Propagation Transformer (DePT), a new transformer-based model that specializes in the global modeling of CPS while taking into account the immutable constraints of the physical world. DePT induces a cone-shaped spatial-temporal attention prior, which injects the information propagation and aggregation principles and enables a global view. With this physical-constraint inductive bias baked into its design, DePT is ready to plug and play for a broad class of multi-agent systems. Experimental results on one of the most challenging CPS tasks, network-scale traffic signal control in the open world, show that our model outperformed state-of-the-art expert methods on synthetic and real-world datasets.
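The cone-shaped spatio-temporal prior can be made concrete with a small mask-construction sketch. This is a hedged simplification of the idea, not DePT's implementation: positions are 1-D, the propagation speed is a single scalar, and all names are invented. The constraint it encodes is that a query token may attend to a key token only if information could physically have traveled the spatial distance between them in the elapsed time, i.e. the key lies inside the query's backward light-cone.

```python
def cone_mask(positions, times, speed):
    """positions[i]: 1-D coordinate of token i; times[i]: its time step.
    Returns mask[i][j] = True iff token i may attend to token j."""
    n = len(positions)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dt = times[i] - times[j]  # only attend backward (or same step)
            if dt >= 0 and abs(positions[i] - positions[j]) <= speed * dt:
                mask[i][j] = True
    return mask

# Three tokens: the same node at t=0 and t=1, and a node 10 units away at t=1.
m = cone_mask(positions=[0, 0, 10], times=[0, 1, 1], speed=5)
```

With speed 5, the node at distance 10 cannot have received information from the origin within one time step, so its row attends only to itself; such a mask would then be applied additively (or as a hard mask) inside standard attention.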


Transportation Department deploying artificial intelligence to spot air traffic dangers, Duffy says

FOX News

Fox News chief Washington correspondent Mike Emanuel has the latest on Transportation Secretary Sean Duffy's statements about recent air traffic control incidents on 'Special Report.' Transportation Secretary Sean Duffy recently announced that artificial intelligence (AI) is being used to detect and address air traffic risks, following a slew of near-misses and fatal plane crashes across the country. Duffy told FOX 5 DC that officials are implementing AI to "identify and address potential air traffic risks nationwide," potentially aiding in preventing tragedies like the fatal Jan. 29 midair collision at Ronald Reagan Washington National Airport (DCA) that claimed the lives of 67 people. Following the Potomac River crash, which involved a commercial plane and an Army Black Hawk helicopter, Duffy announced a plan to build a new "state-of-the-art" traffic control system that will equip locations with better technology to reduce outages, improve efficiency and reinforce safety. Duffy told FOX 5 that when investigators were looking into how to prevent collisions, they asked themselves, "Are there any other DCAs out there?"
[Photo caption: Transportation Secretary Sean Duffy speaks during a news conference following up on the issuance of the National Transportation Safety Board preliminary report on the mid-air collision near Ronald Reagan Washington National Airport, on Tuesday, March 11.]


Supplementary Material for "AttendLight: Universal Attention-Based Reinforcement Learning Model for Traffic Signal Control"

Neural Information Processing Systems

Appendix A includes the details of all numerical experiments. In Appendix A.1, we describe our traffic data and how we generate the synthetic data. Appendices A.2 and A.3 explain the AttendLight model as well as the RL training that we consider throughout the paper. Appendix A.4 gives a brief explanation of our baseline algorithms. Finally, in Appendix A.5, we provide extensive results of AttendLight for single-env and multi-env regimes.
A.1 Details of Real-World Data and the Synthetic Data Generation
In each intersection, we consider three traffic movement sets, namely straight, turn-left, and turn-right. These sets are used in defining the traffic data of an intersection in both real-world and synthetic cases.